A recursive algorithm for the forced alignment of very long audio segments

Authors

  • Pedro J. Moreno
  • Christopher F. Joerg
  • Jean-Manuel Van Thong
  • Oren Glickman
Abstract

In this paper we address the problem of effectively aligning very long audio files (often more than one hour) to their corresponding textual transcripts. We present an efficient recursive technique that works well even on noisy speech signals. The key idea of the algorithm is to turn the forced alignment problem into a recursive speech recognition problem with a gradually restricted dictionary and language model. The algorithm is tolerant to acoustic noise and to errors or gaps in the text transcript or audio tracks. We report experimental results on a 3-hour audio file containing TV and radio broadcasts, showing accurate alignments on speech under a variety of real acoustic conditions, such as speech over music and speech over telephone lines. We also report results when the same audio stream has been corrupted with additive white noise or compressed using a popular web encoding format such as RealAudio. This algorithm has been used in our internal multimedia indexing project, where it has processed more than 200 hours of audio from varied sources, such as WGBH NOVA documentaries and NPR web audio files. Depending on the acoustic conditions of the audio signal, the system aligns speech media content in about one to five times real time.
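The abstract describes the recursion only at a high level. The Python below is a minimal sketch of one way such a scheme could work, not the authors' implementation. It assumes a hypothetical `recognize(start, end, words)` function that decodes an audio span using a dictionary and language model built only from `words` and returns `(word, start, end)` triples. Recognized words that agree with the transcript over a run of at least `min_run` consecutive words (a hypothetical parameter) are accepted as anchors; the gaps between anchors are then re-recognized recursively with a vocabulary restricted to each gap's own text. The hypothesis-to-transcript matching uses Python's standard `difflib.SequenceMatcher`, a swapped-in choice for illustration.

```python
import difflib
from typing import Callable, Dict, List, Sequence, Tuple

# HYPOTHETICAL recognizer interface (an assumption, not the paper's API):
# decode the audio between `start` and `end` seconds using a dictionary and
# language model built only from `words`; return (word, start, end) triples.
Recognizer = Callable[[float, float, Sequence[str]], List[Tuple[str, float, float]]]


def align_recursive(recognize: Recognizer, start: float, end: float,
                    words: Sequence[str], offset: int = 0, min_run: int = 3,
                    depth: int = 0, max_depth: int = 10
                    ) -> Dict[int, Tuple[float, float]]:
    """Map transcript word indices to (start, end) times.

    Words that are never anchored are simply absent from the result.
    """
    if not words or end <= start or depth > max_depth:
        return {}
    words = list(words)
    hyp = recognize(start, end, words)
    matcher = difflib.SequenceMatcher(a=[w for w, _, _ in hyp], b=words,
                                      autojunk=False)
    # Anchors: runs of consecutive hypothesis words that agree with the
    # transcript. Short segments may be anchored by a shorter run.
    need = min(min_run, len(words))
    anchors = [m for m in matcher.get_matching_blocks() if m.size >= need]
    if not anchors:
        return {}  # no progress on this segment; leave it unaligned
    aligned: Dict[int, Tuple[float, float]] = {}
    gap_word, gap_time = 0, start  # where the current unaligned gap begins
    for m in anchors:
        # Recurse on the text and audio between the previous anchor and this
        # one, restricting the vocabulary to that gap's own words.
        aligned.update(align_recursive(
            recognize, gap_time, hyp[m.a][1], words[gap_word:m.b],
            offset + gap_word, min_run, depth + 1, max_depth))
        # Accept the anchored words with the recognizer's time stamps.
        for k in range(m.size):
            _, s, e = hyp[m.a + k]
            aligned[offset + m.b + k] = (s, e)
        gap_word, gap_time = m.b + m.size, hyp[m.a + m.size - 1][2]
    # Trailing gap after the last anchor.
    aligned.update(align_recursive(
        recognize, gap_time, end, words[gap_word:], offset + gap_word,
        min_run, depth + 1, max_depth))
    return aligned
```

For instance, with a stub recognizer that misrecognizes one word only when decoding against the full transcript vocabulary, the first pass anchors the runs on either side of the error, and the recursive call on the one-word gap recovers the missing time stamps. Requiring several consecutive matches before trusting a region makes chance matches less likely to become anchors; the default `min_run=3` and the `max_depth` cap are arbitrary illustrative values.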

Similar sources

Forced Alignment Under Adverse Conditions

The problem of forced alignment is that of matching phonetic segments in an audio sample to its corresponding transcription, which is a vital part of indexing audio files. While various methods have been employed to accomplish this task, the results become less accurate under adverse alignment conditions caused by various disturbances in the audio as well as transcription errors. In fact, the a...

A real-time recursive dynamic model for vehicle driving simulators

This paper presents the Real-Time Recursive Dynamics (RTRD) model that is developed for driving simulators. The model could be implemented in the Driving Simulator. The RTRD can also be used for off-line high-speed dynamics analysis, compared with commercial multibody dynamics codes, to speed up mechanical design process. An overview of RTRD is presented in the paper. Basic models for specific ...

A Framework for Conversational Arabic Speech Long Audio Alignment

We propose a framework for long audio alignment for conversational Arabic speech. Accurate alignments help in many speech processing tasks such as audio indexing, speech recognizer acoustic model (AM) training, audio summarizing and retrieving, etc. In this work, we have collected more than 1400 hours of conversational Arabic besides the corresponding non-aligned text transcriptions. Automatic ...

Forced alignment for speech synthesis databases using duration and prosodic phrase breaks

Alignment of text to recorded audio is limited by the fact that standard techniques do not handle very long utterances well. This work presents a model for segmenting long recordings into smaller utterances. Our approach differs from typical forced alignment techniques in that prosodic phrase break locations are first estimated, and then words are placed around breaks based on length and break ...

Handling large audio files in audio books for building synthetic voices

One of the issues in using audio books for building a synthetic voice is the segmentation of large audio files. The use of standard forced-alignment to obtain phone boundaries on large audio files fails primarily because of huge memory requirements. Earlier works have attempted to resolve this problem by using large vocabulary speech recognition system employing restricted dictionary and langua...

Publication date: 1998